English, Devnagari and Urdu Text Identification

Author

  • U. Pal
Abstract

In a multi-lingual, multi-script country like India, a single text line of a document page may contain words in two or more scripts. For Optical Character Recognition (OCR) of such a page, the different scripts must first be identified. This paper proposes an automatic technique for word-wise identification of English, Devnagari and Urdu scripts in a single document. The document is first segmented into lines, and the lines are then segmented into possible words. The identification scheme is then developed using characteristics of the different scripts.
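
The abstract does not say how the line and word segmentation is carried out. As an illustration only, the sketch below uses projection profiles, a common choice for printed text: rows with no ink separate lines, and sufficiently wide ink-free column gaps separate words. The binary-image representation, the function names (`runs`, `segment_lines`, `segment_words`) and the `min_gap` threshold are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed representation: 1 = ink pixel, 0 = background


def runs(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans of non-zero entries in a profile.

    Spans separated by fewer than min_gap zero entries are merged.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))       # the run ends at the first empty entry
            start = None
    if start is not None:
        spans.append((start, len(profile)))

    merged: List[Tuple[int, int]] = []
    for span in spans:
        if merged and span[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], span[1])  # gap too small: same unit
        else:
            merged.append(span)
    return merged


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Split a page into text lines using the horizontal projection profile."""
    row_profile = [sum(row) for row in image]
    return runs(row_profile)


def segment_words(image: Image, line: Tuple[int, int],
                  min_gap: int = 2) -> List[Tuple[int, int]]:
    """Split one text line into words using the vertical projection profile.

    min_gap is a hypothetical threshold: column gaps narrower than this are
    treated as inter-character spacing rather than word boundaries.
    """
    top, bottom = line
    width = len(image[0]) if image else 0
    col_profile = [sum(image[r][c] for r in range(top, bottom))
                   for c in range(width)]
    return runs(col_profile, min_gap)


# Toy page: two text lines; the first holds two words.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 1, 1],
    [1, 1, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
]
for text_line in segment_lines(page):
    print(text_line, segment_words(page, text_line))
```

Each word span found this way would then be passed to a script classifier. In practice the gap threshold would likely be tuned per document, since inter-character spacing differs between Roman, Devnagari and Urdu text.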

Similar resources

Offline Handwritten Script Identification in Document Images

Automatic handwritten script identification from document images facilitates many important applications, such as sorting, transcription of multilingual documents and indexing of large collections of such images, or serves as a precursor to optical character recognition (OCR). In this paper, we investigate texture as a tool for determining the script of a handwritten document image, based on the observa...

Convolution Based Technique for Indic Script Identification from Handwritten Document Images

Determining the script type of a document image is a complex real-life problem for a multi-script country like India, where 23 official languages (including English) are present and 13 different scripts are used to write them. Including English and Roman, those counts become 23 and 13 respectively. The problem becomes more challenging when handwritten documents are considered. In this paper an app...

Recognition of Handwritten Devnagari Numerals with SVM Classifier

Natural language processing is a field of science and linguistics concerned with the interaction between computers and human languages. Natural language generation systems convert information from computer databases into readable human language. The term “natural” language refers to the languages that people speak, like English and Japanese and Hindi, as opposed to artificial languages like pro...

AGHAZ: An Expert System Based Approach for the Translation of English to Urdu

Machine Translation (MT) of English text to its Urdu equivalent is a difficult challenge. Many attempts have been made, but only a few limited solutions have been provided so far. We present a direct approach, using an expert system to translate English text into its equivalent Urdu, using The Unicode Standard, Version 4.0 (ISBN 0-321-18578-1) Range: 0600–06FF. The expert system works with a knowledg...

Language Engineering System for Automatic Conversion of English Cyber Data into Urdu Websites

English is one of the most widely spoken languages in the world these days. Most commercial websites are also designed in English. Modern software engineering trends support better interfaces for effective Human Computer Interaction (HCI). One of the major HCI requirements is to provide data in a human-readable format. Not all people can benefit from cyber information wh...

Publication date: 2005